A method for parameter hypothesis testing in nonparametric regression with Fourier series approach

The nonparametric regression model with the Fourier series approach was first introduced by Bilodeau in 1992. In later years, several researchers developed this model further. However, that work is limited to parameter estimation, and no research addresses parameter hypothesis testing. Parameter hypothesis testing is a statistical method used to test the significance of parameters. In the nonparametric regression model with the Fourier series approach, it is used to determine whether the estimated parameters have a significant influence on the model. Therefore, the purpose of this research is to develop parameter hypothesis testing for the nonparametric regression model with the Fourier series approach. The method we use for hypothesis testing is the Likelihood Ratio Test (LRT). The LRT compares the likelihood function maximized under the parameter space of the null hypothesis with the likelihood function maximized under the parameter space of the hypothesis (the full model). Using the LRT, we obtain the form of the test statistic, its distribution, and the rejection region of the null hypothesis. To apply the method, we use Return on Asset (ROA) data from 47 go public (publicly listed) banks on the Indonesia Stock Exchange in 2020. The highlights of this research are:
• The Fourier series function is assumed to be a non-smooth function.
• The form of the test statistic is obtained using the LRT method and follows an F distribution.
• The estimated parameters in the model of the ROA data have a significant influence on the model.


Method details
The purpose of this research is to develop a method for parameter hypothesis testing in nonparametric regression with the Fourier series approach using the Likelihood Ratio Test (LRT) method, and to apply the method to Return on Asset (ROA) data from 47 go public (publicly listed) banks on the Indonesia Stock Exchange in 2020. From the hypothesis testing, we obtain the formula of the test statistic, its distribution, and the rejection region for the null hypothesis. The details of the method are given as follows.

The model and its estimation
The nonparametric regression model with Fourier series approach
Regression analysis is a statistical method used to model the relationship between predictor and response variables. Suppose $x_i$ is the predictor variable and $y_i$ is the response variable on the $i$th observation with $i = 1, 2, ..., n$. The relationship between $(x_i, y_i)$ can be expressed as follows.
$$y_i = f(x_i) + \varepsilon_i \tag{1}$$
where $f$ is the regression curve and $\varepsilon_i$ is the error term, which we assume to be normally distributed with mean 0 and constant variance $\sigma^2$. In regression analysis, there are two approaches to model (1), namely the parametric regression model and the nonparametric regression model [1]. If we assume $f$ is a known function, then model (1) can be approached using parametric regression. However, if we assume $f$ is an unknown function, then model (1) can be approached using nonparametric regression.
Whether $f$ should be treated as a known or an unknown function can be assessed with a scatterplot [2]. In this research, we assume $f$ is an unknown function, so model (1) is a nonparametric regression. Nonparametric regression is a regression approach that is not bound by the assumption that the shape of the regression curve is known; it is flexible because the function $f$ can adapt to the local nature of the data. Since $f$ is a nonparametric function, it can be approximated with one of the nonparametric estimators, and one such estimator is the Fourier series function. The Fourier series is a trigonometric polynomial containing cosine and sine functions, first introduced by Joseph Fourier. In 1977, Jong was the first researcher to conduct research on the Fourier series in this context, discussing the transformation of the Fourier series for smoothing the density function in the spectral estimator [3]. In later years, several researchers followed, using the Fourier series in the form $f(x_i) = \frac{a_0}{2} + \sum_{k=1}^{K} \left( a_k \cos(k x_i) + b_k \sin(k x_i) \right)$ [4-7]. However, in 1992 Bilodeau developed the Fourier series function for a smoothing model in nonparametric regression by modifying the function: he uses only the cosine terms and adds $b x_i$ as a trend [8]. The Fourier series function therefore becomes $f(x_i) = \frac{1}{2} a_0 + b x_i + \sum_{k=1}^{K} a_k \cos(k x_i)$. This type of Fourier series function was developed further and used in nonparametric regression models [see, 9-12]. The advantages of using the Fourier series function in nonparametric regression are that it can handle data with a trend that recurs at certain intervals and that it has a good statistical interpretation.
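As a minimal sketch of the modified (cosine plus trend) Fourier series function described above, the following Python snippet evaluates $f(x) = \frac{1}{2} a_0 + b x + \sum_{k=1}^{K} a_k \cos(kx)$ for given coefficients. The function name `fourier_curve` and the coefficient values are illustrative assumptions, not taken from the paper.

```python
import math

def fourier_curve(x, b, a0, a):
    """Evaluate f(x) = a0/2 + b*x + sum_{k=1..K} a[k-1] * cos(k*x).

    b  : coefficient of the linear trend term
    a0 : constant-term coefficient (enters the curve as a0/2)
    a  : list of cosine coefficients a_1..a_K, so K = len(a)
    """
    trend = b * x
    cosines = sum(a_k * math.cos(k * x) for k, a_k in enumerate(a, start=1))
    return a0 / 2.0 + trend + cosines

# Hypothetical coefficients, chosen only for illustration (K = 2).
value_at_zero = fourier_curve(0.0, b=0.5, a0=2.0, a=[1.0, -0.25])
value_at_pi = fourier_curve(math.pi, b=0.5, a0=2.0, a=[1.0, -0.25])
```

Increasing `len(a)` adds higher-frequency cosine terms, which is the role played by the oscillation parameter $K$ in the text.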
The nonparametric regression model given in (1) contains only one predictor variable (univariable model). In this research, we consider $p$ predictor variables (multivariable model). Suppose $n$ is the number of observations and $p$ is the number of predictor variables. The relationship between the predictor variables and the response variable $(x_{1i}, x_{2i}, ..., x_{pi}; y_i)$ is assumed to follow the nonparametric regression model as follows.
$$y_i = f(x_{1i}, x_{2i}, ..., x_{pi}) + \varepsilon_i \tag{2}$$
If we assume that all the predictor variables are independent, in other words that $x_1, x_2, ..., x_p$ are not correlated, then model (2) can be written in additive form as follows.
$$y_i = \sum_{j=1}^{p} f_j(x_{ji}) + \varepsilon_i \tag{3}$$
Since (2) is a nonparametric regression model, the $f_j$ in (3) are unknown nonparametric regression curves. Let the $f_j$ be continuous functions with $x_{ji} \in (0, \pi)$; then each $f_j$ can be approximated with the Fourier series function [8].
$$f_j(x_{ji}) = \frac{1}{2} a_{0j} + b_j x_{ji} + \sum_{k=1}^{K} a_{kj} \cos(k x_{ji}) \tag{4}$$
In general, for $i = 1, 2, ..., n$, the nonparametric regression model in Eq. (3), with each $f_j$ approximated by (4), can be written in matrix and vector form as the linear model $\tilde{y} = X(K) B + \tilde{\varepsilon}$, where $\tilde{y}$ is the $n \times 1$ vector of the response variable, $X(K)$ is the matrix built from the trend and cosine terms of the predictor variables, $B$ is the vector of parameters, and $\tilde{\varepsilon}$ is the vector of errors.

Parameter estimation
Estimating the regression curve $\tilde{f}$ is equivalent to estimating the parameters. As in many nonparametric regression models, several methods are available for estimating the parameters, such as the Penalized Least Square (PLS) method when the regression curve $\tilde{f}$ is assumed to be a smooth function [8, 10, 13, 14]. However, if the regression curve $\tilde{f}$ is assumed only to be an unknown function and is presented as a linear model as in (7), then we can use the Ordinary Least Square (OLS) method, which minimizes the sum of squared errors. Using the OLS optimization, the estimate of $B$ is obtained as follows [12, 15-17].
$$\hat{B} = \left( X(K)' X(K) \right)^{-1} X(K)' \tilde{y} \tag{8}$$
where $X(K)$ is the matrix of trend and cosine terms of the linear model and $\tilde{y}$ is the vector of the response variable.
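The OLS step can be sketched in a few lines of Python for the single-predictor case. This is an illustrative sketch under stated assumptions, not the authors' R implementation: the helper names (`design_row`, `solve`, `ols_fourier`), the column order `[1/2, x, cos(x), ..., cos(Kx)]`, and the synthetic noise-free data are all hypothetical.

```python
import math

def design_row(x, K):
    """One row of X(K) for a single predictor: [1/2, x, cos(x), ..., cos(Kx)]."""
    return [0.5, x] + [math.cos(k * x) for k in range(1, K + 1)]

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * z[c] for c in range(r + 1, m))
        z[r] = (M[r][m] - tail) / M[r][r]
    return z

def ols_fourier(xs, ys, K):
    """Return B_hat = (X'X)^{-1} X'y, ordered as [a0, b, a1, ..., aK]."""
    X = [design_row(x, K) for x in xs]
    m = K + 2
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(m)] for a in range(m)]
    Xty = [sum(row[a] * y for row, y in zip(X, ys)) for a in range(m)]
    return solve(XtX, Xty)

# Synthetic, noise-free data generated from hypothetical coefficients.
true_B = [1.0, 0.8, -0.5, 0.3]                      # a0, b, a1, a2
xs = [math.pi * (i + 0.5) / 20 for i in range(20)]  # 20 points in (0, pi)
ys = [sum(c * v for c, v in zip(true_B, design_row(x, 2))) for x in xs]
B_hat = ols_fourier(xs, ys, K=2)
```

Because the data are noise-free, `B_hat` should reproduce `true_B` up to rounding error. With several predictors the same normal equations apply, with one block of columns per predictor.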

Curve estimation and model selection
Based on Eq. (6) and (8), we obtain the estimated curve of the nonparametric regression with the Fourier series approach. Note that $\tilde{f}$ is the regression curve and $B$ is the parameter of the model, which is estimated by $\hat{B}$ as in Eq. (8). Therefore, the estimate of the regression curve $\tilde{f}$ is $\hat{\tilde{f}}$ as follows.
$$\hat{\tilde{f}} = X(K) \hat{B} = A(K) \tilde{y}, \quad \text{where } A(K) = X(K) \left( X(K)' X(K) \right)^{-1} X(K)' \tag{10}$$
The estimate of the regression curve (10) is in matrix and vector form. In general, by analogy with Eq. (6) and (4), the estimate of each regression curve in nonparametric regression with the Fourier series approach can be written as follows.
$$\hat{f}_j(x_{ji}) = \frac{1}{2} \hat{a}_{0j} + \hat{b}_j x_{ji} + \sum_{k=1}^{K} \hat{a}_{kj} \cos(k x_{ji})$$
Furthermore, the estimate $\hat{B}$ obtained using the OLS method in (8) depends on an unknown quantity, namely the oscillation parameter $K$. In nonparametric regression with the OLS method, there is always one such unknown quantity, for example the knots in the spline function, the bandwidth in the kernel function, and the oscillation parameter in the Fourier series function. Therefore, obtaining the parameter estimate that best fits the model is equivalent to obtaining the optimum number of $K$ in the Fourier series function. The methods that can be used to obtain the optimum $K$ are Cross Validation (CV) and Generalized Cross Validation (GCV). The GCV method has been developed by many researchers for the spline function [see, 18-19] and by Bilodeau for the Fourier series function [8]. The GCV formula for choosing the optimum oscillation parameter $K$ is given as follows (the optimum $K$ is the one with the minimum GCV value):
$$\mathrm{GCV}(K) = \frac{\mathrm{MSE}(K)}{\left( n^{-1} \, \mathrm{trace}(I - A(K)) \right)^2}, \quad \mathrm{MSE}(K) = n^{-1} \, \tilde{y}' (I - A(K))' (I - A(K)) \, \tilde{y} \tag{11}$$
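The selection of $K$ by minimum GCV can be sketched as follows. The sketch assumes the common form GCV = MSE / (n⁻¹ trace(I − A))², and it assumes that the residuals and trace(A) of each candidate fit are already available; the function names and the residual values are made up for illustration.

```python
def gcv(residuals, trace_A):
    """GCV = MSE / ((1/n) * trace(I - A))^2, using trace(I - A) = n - trace(A)."""
    n = len(residuals)
    mse = sum(e * e for e in residuals) / n
    return mse / ((n - trace_A) / n) ** 2

def best_K(candidates):
    """candidates maps K -> (residuals, trace_A); return the K with the smallest GCV."""
    scores = {K: gcv(res, tr) for K, (res, tr) in candidates.items()}
    return min(scores, key=scores.get), scores

# Made-up residual vectors for two candidate fits (illustration only).
# For a full-rank OLS fit, trace(A) equals the number of fitted parameters.
candidates = {
    1: ([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0], 3),
    2: ([0.9, -0.9, 0.9, -0.9, 0.9, -0.9, 0.9, -0.9], 4),
}
K_opt, scores = best_K(candidates)
```

In this toy case the larger $K$ lowers the MSE but is penalized more heavily in the denominator, so the smaller $K$ wins. The same trade-off appears in the ROA application, where the highest R² does not give the smallest GCV.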

Parameter hypothesis testing using the LRT method
Parameter hypothesis testing plays an important role in modelling and is an essential part of statistical inference in regression analysis. It is used to determine whether the estimated parameters have a significant influence on the model. For the nonparametric regression model with the Fourier series approach, parameter hypothesis testing has not been carried out previously. This model has been used by several researchers for modelling and prediction in various fields and data sets [see, 12, 15-17], but that work focused only on estimation and modelling. Therefore, it is essential to develop a method for parameter hypothesis testing in the nonparametric regression model with the Fourier series approach. One method that can be used for parameter hypothesis testing is the LRT method, which compares the goodness of fit of two different models (the model under the null hypothesis and the model under the hypothesis). This method is widely used for parameter hypothesis testing in regression analysis [see, 20-22].
According to Casella and Berger, the LRT method for hypothesis testing is related to Maximum Likelihood Estimation (MLE) [23]. Let $X_1, X_2, ..., X_n$ be a random sample from a population with Probability Density Function (PDF) $f(x_i | \mu)$, where $\mu$ is a parameter ($\mu$ may also be a vector). The likelihood function is then defined as follows.
$$L(\mu | x_1, x_2, ..., x_n) = \prod_{i=1}^{n} f(x_i | \mu)$$
Definition 1. Suppose $\Theta$ is the parameter space. The LRT statistic for testing $H_0: \mu \in \Theta_0$ versus $H_1: \mu \in \Theta_0^c$ is
$$\Lambda = \frac{\sup_{\Theta_0} L(\mu | x)}{\sup_{\Theta} L(\mu | x)} \tag{12}$$
where $L(\mu | x)$ is the likelihood function with parameter $\mu$. The LRT rejects the null hypothesis in the region $\{ x_1, x_2, ..., x_n \mid \Lambda \le c \}$, where $c$ is any constant with $0 \le c \le 1$. Suppose $\hat{\mu}$ is the parameter estimate under the parameter space $\Theta$ and $\hat{\mu}_0$ is the parameter estimate under the parameter space $\Theta_0$, both obtained by MLE so that they maximize the likelihood function. Then the LRT in (12) can be written as follows.
$$\Lambda = \frac{L(\hat{\mu}_0 | x)}{L(\hat{\mu} | x)} \tag{13}$$
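The hypothesis form and its parameter space
As the central objective of this research is to develop a method for parameter hypothesis testing in nonparametric regression with the Fourier series approach, the initial step is to formulate the hypothesis. Suppose the hypothesis form is given as follows.
The hypothesis in (14) tests two different models against each other: a model without parameters (the null hypothesis $H_0$) and a model containing at least one of the parameters (the alternative $H_1$). Mathematically, $H_0$ states that every element of the parameter vector $B$ equals zero, and $H_1$ states that at least one element of $B$ is not zero (15).
Under the assumption of model (2), the errors $\varepsilon_i$ are normally distributed with mean 0 and constant variance $\sigma^2$, so the PDF of $\varepsilon_i$ is
$$f(\varepsilon_i) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( -\frac{\varepsilon_i^2}{2 \sigma^2} \right) \tag{16}$$
Based on (3), where $\varepsilon_i = y_i - \sum_{j=1}^{p} f_j(x_{ji})$ and the $f_j$ are Fourier series functions, we obtain the likelihood function of (16) as follows.
$$L = (2 \pi \sigma^2)^{-n/2} \exp\left( -\frac{1}{2 \sigma^2} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} f_j(x_{ji}) \Big)^2 \right) \tag{17}$$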
MethodsX 11 (2023) 102468
Suppose $\omega$ is the parameter space under the null hypothesis and $\Omega$ is the parameter space under the hypothesis. Based on the hypothesis form in (14) and the likelihood function (17), we define the parameter space under the null hypothesis ($\omega$) and the parameter space under the hypothesis ($\Omega$) as follows.
$$\omega = \{ \sigma_\omega^2 \}, \quad \Omega = \{ B_\Omega, \sigma_\Omega^2 \} \tag{18}$$

The statistical test
Based on Definition 1, the test statistic for the hypothesis in (14) is obtained by comparing the maximum likelihood under the parameter space of the null hypothesis ($\omega$) with the maximum likelihood under the parameter space of the hypothesis ($\Omega$), as given in Theorem 1. Before presenting Theorem 1, we first introduce Lemma 1 and Lemma 2. Lemma 1 shows how to obtain the maximum likelihood under $\omega$, and Lemma 2 shows how to obtain the maximum likelihood under $\Omega$.
Lemma 1. Suppose $\omega$ is the parameter space under the null hypothesis (18). Then the maximum of the likelihood function (17) is
$$L(\hat{\omega}) = (2 \pi \hat{\sigma}_\omega^2)^{-n/2} e^{-n/2} \tag{19}$$
where $\hat{\sigma}_\omega^2 = \dfrac{\tilde{y}' \tilde{y}}{n}$.
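Proof. In this case, the parameter space $\omega$ contains only the variance, since all the parameters are set to zero under the null hypothesis. By the likelihood function (17) and the parameter space $\omega$ in (18), the likelihood function under $\omega$ is
$$L(\omega) = (2 \pi \sigma_\omega^2)^{-n/2} \exp\left( -\frac{\tilde{y}' \tilde{y}}{2 \sigma_\omega^2} \right) \tag{20}$$
where $\tilde{y}$ is the vector of the response variable. To obtain the maximum of (20), we estimate $\sigma_\omega^2$ by solving $\partial \ln L(\omega) / \partial \sigma_\omega^2 = 0$. The natural logarithm of (20) is
$$\ln L(\omega) = -\frac{n}{2} \ln(2 \pi \sigma_\omega^2) - \frac{\tilde{y}' \tilde{y}}{2 \sigma_\omega^2} \tag{21}$$
Taking the partial derivative of (21) with respect to $\sigma_\omega^2$ and setting it to zero gives
$$-\frac{n}{2 \sigma_\omega^2} + \frac{\tilde{y}' \tilde{y}}{2 \sigma_\omega^4} = 0 \tag{22}$$
Therefore, by (22), the estimate of $\sigma_\omega^2$ is
$$\hat{\sigma}_\omega^2 = \frac{\tilde{y}' \tilde{y}}{n} \tag{23}$$
Since $\sigma_\omega^2$ is estimated by $\hat{\sigma}_\omega^2$ in (23), the maximum of the likelihood function under $\omega$ is $L(\hat{\omega}) = (2 \pi \hat{\sigma}_\omega^2)^{-n/2} e^{-n/2}$. □

Lemma 2. Suppose $\Omega$ is the parameter space under the hypothesis (18). Then the maximum of the likelihood function (17) is
$$L(\hat{\Omega}) = (2 \pi \hat{\sigma}_\Omega^2)^{-n/2} e^{-n/2} \tag{24}$$
Proof. Note that the parameter space $\Omega$ contains all the parameters in the model (the full model). Under $\Omega$, the likelihood function (17) is
$$L(\Omega) = (2 \pi \sigma_\Omega^2)^{-n/2} \exp\left( -\frac{1}{2 \sigma_\Omega^2} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} f_j(x_{ji}) \Big)^2 \right) \tag{25}$$
Since the $f_j$ are unknown functions approximated by the Fourier series function (4), the likelihood function (25) becomes
$$L(\Omega) = (2 \pi \sigma_\Omega^2)^{-n/2} \exp\left( -\frac{(\tilde{y} - X(K) B_\Omega)' (\tilde{y} - X(K) B_\Omega)}{2 \sigma_\Omega^2} \right) \tag{26}$$
The likelihood function under $\Omega$ in (26) is maximized by estimating $B_\Omega$ and $\sigma_\Omega^2$. The estimate of $B_\Omega$ is obtained by solving $\partial \ln L(\Omega) / \partial B_\Omega = 0$:
$$\hat{B}_\Omega = \left( X(K)' X(K) \right)^{-1} X(K)' \tilde{y} \tag{27}$$
The estimate of $\sigma_\Omega^2$ is obtained in the same way as $\hat{\sigma}_\omega^2$ in Lemma 1, by solving $\partial \ln L(\Omega) / \partial \sigma_\Omega^2 = 0$:
$$\sigma_\Omega^2 = \frac{(\tilde{y} - X(K) B_\Omega)' (\tilde{y} - X(K) B_\Omega)}{n} \tag{28}$$
Fixing $B_\Omega$ at its estimate, in other words substituting $\hat{B}_\Omega$ (27) into (28), we obtain the estimate of $\sigma_\Omega^2$:
$$\hat{\sigma}_\Omega^2 = \frac{(\tilde{y} - X(K) \hat{B}_\Omega)' (\tilde{y} - X(K) \hat{B}_\Omega)}{n} \tag{29}$$
Therefore, substituting $\hat{B}_\Omega$ (27) and $\hat{\sigma}_\Omega^2$ (29) into the likelihood function (26), the maximum of the likelihood function under $\Omega$ is $L(\hat{\Omega}) = (2 \pi \hat{\sigma}_\Omega^2)^{-n/2} e^{-n/2}$. □

Furthermore, the test statistic for the hypothesis in (14) is obtained using the LRT method, as stated in Theorem 1.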

Theorem 1.
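Given the nonparametric regression model (3) with $f_j$ approximated by the Fourier series function (4), by using the LRT method, the test statistic for the hypothesis in (14) is
$$\Lambda^* = \frac{\hat{B}_\Omega' X(K)' X(K) \hat{B}_\Omega / \nu_1}{(\tilde{y} - X(K) \hat{B}_\Omega)' (\tilde{y} - X(K) \hat{B}_\Omega) / \nu_2} \tag{30}$$
where the null hypothesis is rejected when $\Lambda^* \ge c^*$, with $\nu_1$ and $\nu_2$ given in Theorem 2 and $c^* = \dfrac{\nu_2}{\nu_1} \left( c^{-2/n} - 1 \right)$.
Proof. The error in (3) is assumed to be normally distributed with mean 0 and constant variance $\sigma^2$, and the hypothesis is given in (14), with $\omega$ the parameter space under the null hypothesis and $\Omega$ the parameter space under the hypothesis. Therefore, by Definition 1 of the LRT in (13), we obtain
$$\Lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} \tag{31}$$
By Lemma 1 and Lemma 2, the LRT in (31) becomes
$$\Lambda = \left( \frac{\hat{\sigma}_\omega^2}{\hat{\sigma}_\Omega^2} \right)^{-n/2} \tag{32}$$
Since $\hat{\sigma}_\omega^2$ and $\hat{\sigma}_\Omega^2$ are given in (23) and (29), the LRT in (32) becomes
$$\Lambda = \left( \frac{\tilde{y}' \tilde{y}}{(\tilde{y} - X(K) \hat{B}_\Omega)' (\tilde{y} - X(K) \hat{B}_\Omega)} \right)^{-n/2} \tag{33}$$
The component $\tilde{y}' \tilde{y}$ in (33) can be decomposed as follows. Because the residual vector $\tilde{y} - X(K) \hat{B}_\Omega$ is orthogonal to the columns of $X(K)$,
$$\tilde{y}' \tilde{y} = (\tilde{y} - X(K) \hat{B}_\Omega)' (\tilde{y} - X(K) \hat{B}_\Omega) + \hat{B}_\Omega' X(K)' X(K) \hat{B}_\Omega$$
so that the rejection region $\Lambda \le c$ is equivalent to
$$\frac{\hat{B}_\Omega' X(K)' X(K) \hat{B}_\Omega}{(\tilde{y} - X(K) \hat{B}_\Omega)' (\tilde{y} - X(K) \hat{B}_\Omega)} \ge c^{-2/n} - 1 \tag{36}$$
Let $\nu_1$ and $\nu_2$ be the degrees of freedom given later in Theorem 2. Multiplying both sides of (36) by $\nu_2 / \nu_1$, we obtain the test statistic $\Lambda^*$ in (30) for the hypothesis in (14), together with the rejection region $\Lambda^* \ge c^*$.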

□
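To make the link between the likelihood ratio and the F-type statistic concrete, the following sketch computes both from a response vector and its least-squares fitted values. It relies on the identity $\tilde{y}'\tilde{y} = \text{(fitted sum of squares)} + \text{(residual sum of squares)}$, which holds for a least-squares fit. The function name, the toy data, and the degrees of freedom passed in are hypothetical.

```python
def lrt_statistics(y, y_hat, nu1, nu2):
    """Return (Lambda, Lambda_star) for H0: all parameters are zero.

    y     : observed responses
    y_hat : least-squares fitted values under the full model
    nu1   : numerator degrees of freedom (assumed given)
    nu2   : denominator degrees of freedom (assumed given)
    """
    n = len(y)
    ss_fit = sum(f * f for f in y_hat)                     # fitted sum of squares
    ss_res = sum((a - f) ** 2 for a, f in zip(y, y_hat))   # residual sum of squares
    lam = (1.0 + ss_fit / ss_res) ** (-n / 2.0)            # likelihood ratio
    lam_star = (ss_fit / nu1) / (ss_res / nu2)             # F-type statistic
    return lam, lam_star

# Toy example: fitted values are group means, so residuals are orthogonal to the fit.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]
lam, lam_star = lrt_statistics(y, y_hat, nu1=2, nu2=2)
```

A small value of `lam` corresponds to a large value of `lam_star`, which is why the rejection region $\Lambda \le c$ turns into an upper-tail region for $\Lambda^*$.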
The distribution of the statistical test and the rejection region
The test statistic obtained in Theorem 1 is for testing the hypothesis in (14). To determine whether the null hypothesis in (14) is rejected or fails to be rejected using the test statistic (30), we need to establish the rejection region for the null hypothesis by determining the distribution of the test statistic. The distribution of the test statistic $\Lambda^*$ is given in Theorem 2. However, to support the proof of Theorem 2, it is necessary to simplify $\Lambda^*$ as provided in Corollary 1 below.

Corollary 1.
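The test statistic $\Lambda^*$ presented in Theorem 1 can be simplified as follows.
$$\Lambda^* = \frac{\tilde{y}' A(K) \tilde{y} / \nu_1}{\tilde{y}' D(K) \tilde{y} / \nu_2}$$
where $A(K)$ is given in Eq. (10) and $D(K) = I - A(K)$.
[Table 1 residue: "Operating expenses and operating revenue"; $x_5$: "Loan to deposit"]
Based on (43) and (44), the matrix $D(K)$ is symmetric and idempotent, thus $\tilde{y}' D(K) \tilde{y} / \sigma^2$ follows a chi-square distribution with $\nu_2 = \mathrm{trace}(D(K))$ degrees of freedom. Since $I$ is the identity matrix of dimension $n \times n$ and $\mathrm{trace}(A(K)) = \nu_1$ from (i), we obtain $\nu_2 = n - \nu_1$. Based on (i) and (ii), we have shown that $\tilde{y}' A(K) \tilde{y} / \sigma^2 \sim \chi^2(\nu_1)$ and $\tilde{y}' D(K) \tilde{y} / \sigma^2 \sim \chi^2(\nu_2)$, as well as (iii) that the quadratic forms in $A(K)$ and $D(K)$ are independent. Therefore, the test statistic $\Lambda^*$ given in Theorem 1 and simplified in Corollary 1 follows the $F(\nu_1, \nu_2)$ distribution:
$$\Lambda^* \sim F(\nu_1, \nu_2)$$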
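The decision rule implied by the F distribution can be illustrated with a small simulation. The sketch below is not the authors' procedure: instead of the exact F distribution function (available in statistical software), it approximates the upper-tail probability by Monte Carlo, drawing the two independent chi-square variables that define an F variate. The function names, the number of draws, and the seed are assumptions made for illustration.

```python
import random

def f_pvalue_mc(stat, nu1, nu2, draws=20000, seed=0):
    """Monte Carlo estimate of P(F(nu1, nu2) >= stat)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        chi_num = rng.gammavariate(nu1 / 2.0, 2.0)  # chi-square with nu1 df
        chi_den = rng.gammavariate(nu2 / 2.0, 2.0)  # chi-square with nu2 df
        if (chi_num / nu1) / (chi_den / nu2) >= stat:
            exceed += 1
    return exceed / draws

def reject_null(stat, nu1, nu2, alpha=0.05):
    """Reject H0 when the estimated probability value is below alpha."""
    return f_pvalue_mc(stat, nu1, nu2) < alpha

# Hypothetical degrees of freedom, chosen only for illustration.
p_at_zero = f_pvalue_mc(0.0, nu1=4, nu2=20)
p_at_large = f_pvalue_mc(1000.0, nu1=4, nu2=20)
```

In practice an exact F distribution function should be preferred; the simulation only shows that comparing the probability value with the significance level is the same as comparing $\Lambda^*$ with an upper quantile of $F(\nu_1, \nu_2)$.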

Data source and analysis steps
We use secondary data to apply the method. The data are ROA values collected from the 2020 annual reports of 47 go public banks, that is, banks whose shares are traded on the Indonesia Stock Exchange. The list of the 47 banks is given in Table A1 (see Appendix A). We use ROA as the response variable ($y$) together with 5 predictor variables ($x_j$); the variables are described in Table 1.

Data analysis steps:
1. Create a scatterplot between the response and each of the predictor variables.
2. Assume that the relationship between the response and all the predictor variables follows the nonparametric regression model with the Fourier series approach.
3. Choose the optimum number of $K$ when $K$ is the same for all the predictor variables, using the GCV method (11).
4. Choose the optimum number of $K$ when $K$ differs between predictor variables, using the GCV method (11).
5. Select the best model from steps 3 and 4 based on the smallest GCV value.
6. Estimate the parameters.
7. Formulate the hypothesis for testing the parameters.
8. Calculate the value of the test statistic and the probability value based on the distribution of the test statistic.
9. Compare the probability value with the significance level $\alpha$.

In this research, we use the R programming language for the exploration and analysis of the ROA data. To streamline the implementation of the method, we have developed a package (syntax) in R-Studio that covers the data analysis steps described above. We have made this package publicly available to facilitate its application to other datasets. The package can be accessed through the following link ( https://rpubs.com/Authorsdataanalysis/1104036 ).

Application on ROA data
We create a scatterplot of each predictor variable against the response variable to check whether the relationship between them follows the nonparametric regression model. The scatterplots are given in Fig. 1.
Based on Fig. 1, the relationships between the response variable and the predictor variables $x_1$, $x_2$, $x_3$, and $x_5$ do not exhibit specific patterns, while $x_4$ exhibits a tendency toward linearity. However, the linearity of $x_4$ cannot be established definitively without further analysis. To address this, we modelled $x_4$ using linear parametric regression and obtained an $R^2$ of 74.66 % with an MSE of 1.898. For comparison, we also used nonparametric regression with the Fourier series approach. With $K = 1$, we obtained an $R^2$ of 75.52 % with an MSE of 1.834. Moreover, in a trial with the maximum number of $K$ set to 10, we obtained the optimum $K = 7$ based on the minimum GCV value of 1.803, with an $R^2$ of 84.26 % and an MSE of 1.179. Based on these trial results, we conclude that although $x_4$ shows a tendency toward linearity in the scatterplot, it is better modelled using nonparametric regression with the Fourier series approach, whether with $K = 1$ or with the optimum $K = 7$ (as shown by the $R^2$ and MSE values). Therefore, in this research, we chose to model the ROA data using nonparametric regression with the Fourier series approach for all the predictor variables.
Since the Fourier series function depends on the number of $K$, we use the GCV method to obtain the optimum $K$. In this research, we carried out several trials, including the case where $K$ is the same for all the predictor variables and the case where $K$ differs between predictor variables. In the analysis, we set the maximum number of $K$ to 5. The values of GCV, $R^2$, and MSE when $K$ is the same for all the predictor variables are given in Table 2.
We obtain the minimum GCV value of 1.772, which means that the optimum number of $K$ is 2. Although $R^2$ is highest for $K = 5$, the corresponding GCV value of 2.507 is higher than the GCV value for $K = 2$. Therefore, when $K$ is the same for all the predictor variables, the best estimated model for the ROA data using nonparametric regression with the Fourier series approach is the one with $K = 2$. Moreover, we examined combinations of $K$ across the predictor variables, taking the maximum number of $K$ as 2, 3, 4, and 5. For a maximum $K$ of 2 there are 32 combinations, for 3 there are 243 combinations, for 4 there are 1024 combinations, and for 5 there are 3125 combinations. From this analysis, we obtain the minimum GCV value, together with the $R^2$ and MSE values, for the combination of $K$ on each predictor variable when the maximum number of $K$ is 2, 3, 4, and 5.
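The combination counts quoted above follow from assigning each of the 5 predictor variables its own $K$ between 1 and the chosen maximum, which gives (maximum $K$)⁵ candidates. A short sketch of the enumeration, with a hypothetical helper name:

```python
from itertools import product

def k_combinations(k_max, n_predictors=5):
    """All assignments of K in 1..k_max to each of the predictor variables."""
    return list(product(range(1, k_max + 1), repeat=n_predictors))

# Number of candidate models for each maximum K considered in the text.
counts = {k_max: len(k_combinations(k_max)) for k_max in (2, 3, 4, 5)}
```

Each tuple, for example `(1, 2, 1, 2, 1)`, is one candidate model whose GCV value would then be computed and compared.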
In Table 3 we show only the optimum combination of $K$, based on the minimum GCV value, for each maximum number of $K$ (2, 3, 4, and 5). For example, with a maximum $K$ of 2 there are 32 combinations of $K$ across the predictor variables, and the minimum GCV value of 1.537 is obtained for the combination $K = 1$ for $x_1$, 2 for $x_2$, 1 for $x_3$, 2 for $x_4$, and 1 for $x_5$. Over all the combinations considered, the minimum GCV value is 1.375, obtained when the maximum $K$ is 5 with the combination $K = 5$ for $x_1$, 2 for $x_2$, 3 for $x_3$, 2 for $x_4$, and 1 for $x_5$. Based on this combination, we obtain the general form of the nonparametric regression model with the Fourier series approach for the 2020 ROA data as follows. Furthermore, we obtain the estimates of all the parameters in the model (46) as follows.
After obtaining the estimates of all the parameters in Table 4, we conducted parameter hypothesis testing to determine whether all the estimated parameters have a significant influence on the model (46). Based on the model (46) and the hypothesis form (14), we define the null hypothesis and the alternative hypothesis for testing the parameters in the model (46) as follows.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Fig. 1. Scatterplots of the response variable and the predictor variables.

Table 1
Variable description.

Table 2
GCV values when the number of $K$ is the same for all the predictor variables.

Table 3
Minimum GCV values for the optimum combinations of $K$.

Table A1
List of 47 go public banks.